Motivations and Methods for Text Simplification

Authors

  • Raman Chandrasekar
  • Christine Doran
  • Srinivas Bangalore
Abstract

Long and complicated sentences prove to be a stumbling block for current systems relying on NL input. These systems stand to gain from methods that syntactically simplify such sentences. To simplify a sentence, we need [...] [If] complex text can be made simpler, sentences become easier to process, both for programs and humans. We discuss a simplification process which identifies components of a sentence that may be separated out, and transforms each of these into freestanding simpler sentences.

Clearly, some nuances of meaning from the original text may be lost in the simplification process. Simplification is therefore inappropriate for texts (such as legal documents) where it is important not to lose any nuance. However, one can conceive of several areas of natural language processing where such simplification would be of great use. This is especially true in domains such as machine translation, which commonly have a manual post-processing stage, where semantic and pragmatic repairs may be carried out if necessary.

  • Parsing: Syntactically complex sentences are likely to generate a large number of parses, and may cause parsers to fail altogether. Resolving ambiguities in attachment of constituents is nontrivial. This ambiguity is reduced for simpler sentences since they involve fewer constituents. Thus simpler sentences lead to faster parsing and less parse ambiguity. Once the parses for the simpler sentences are obtained, the subparses can be assembled to form a full parse, or left as is, depending on the application.
  • Machine Translation (MT): As in the parsing case, simplification results in simpler sentential structures and reduced ambiguity. As argued in (Chandrasekar, 1994), this could lead to improvements in the quality of machine translation.
  • Information Retrieval: IR systems usually retrieve large segments of texts of which only a part may be relevant. With simplified texts, it is possible to extract specific phrases or simple sentences of relevance in response to queries.
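
The idea of separating out a component of a sentence and turning it into a freestanding simpler sentence can be illustrated with a toy sketch. This is not the authors' method: it assumes, purely for illustration, that the component is a relative clause introduced by "who" or "which" and set off by commas, and it uses a regular expression rather than any real syntactic analysis. The names `RELATIVE` and `simplify` and the example sentence are hypothetical.

```python
import re
from typing import List

# Hypothetical toy pattern: "<subject>, who|which <clause>, <rest>."
# Assumption: the relative clause is delimited by commas and the subject
# is everything before the first comma. Real text needs real analysis.
RELATIVE = re.compile(
    r"^(?P<subject>[^,]+), (?:who|which) (?P<clause>[^,]+), (?P<rest>.+?)\.?$"
)


def simplify(sentence: str) -> List[str]:
    """Split one comma-delimited relative clause into its own sentence.

    Returns the input unchanged (as a one-element list) when the toy
    pattern does not apply.
    """
    text = sentence.strip()
    match = RELATIVE.match(text)
    if match is None:
        return [text]
    subject = match.group("subject")
    main = f"{subject} {match.group('rest')}."      # main clause on its own
    extra = f"{subject} {match.group('clause')}."   # clause made freestanding
    return [main, extra]


if __name__ == "__main__":
    for part in simplify("The lawyer, who lives in Boston, won the case."):
        print(part)
```

Each output sentence has fewer constituents than the input, which is the property the parsing and machine translation arguments above rely on. Some nuance is lost (for example, the link between the two facts), as the abstract itself cautions.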

Similar resources

The Effect of Reducing Lexical and Syntactic Complexity of Texts on Reading Comprehension

The present study investigated the effect of different types of text simplification (i.e., reducing the lexical and syntactic complexity of texts) on reading comprehension of English as a Foreign Language learners (EFL). Sixty female intermediate EFL learners from three intact classes in Tabarestan Language Institute in Tehran participated in the study. The intact classes were assigned to three...

Extraction of Drug-Drug Interaction from Literature through Detecting Linguistic-based Negation and Clause Dependency

Extracting biomedical relations such as drug-drug interaction (DDI) from text is an important task in biomedical NLP. Due to the large number of complex sentences in biomedical literature, researchers have employed some sentence simplification techniques to improve the performance of the relation extraction methods. However, due to difficulty of the task, there is no noteworthy improvement in t...

Text Simplification Tools for Spanish

In this paper we describe the development of a text simplification system for Spanish. Text simplification is the adaptation of a text to the special needs of certain groups of readers, such as language learners, people with cognitive difficulties and elderly people, among others. There is a clear need for simplified texts, but manual production and adaptation of existing texts is labour intens...

Sentence Alignment Methods for Improving Text Simplification Systems

We provide several methods for sentence-alignment of texts with different complexity levels. Using the best of them, we sentence-align the Newsela corpora, thus providing large training materials for automatic text simplification (ATS) systems. We show that using this dataset, even the standard phrase-based statistical machine translation models for ATS can outperform the state-of-the-art ATS sy...

Building a Monolingual Parallel Corpus for Text Simplification Using Sentence Similarity Based on Alignment between Word Embeddings

Methods for text simplification using the framework of statistical machine translation have been extensively studied in recent years. However, building the monolingual parallel corpus necessary for training the model requires costly human annotation. Monolingual parallel corpora for text simplification have therefore been built only for a limited number of languages, such as English and Portugu...

Optimizing Statistical Machine Translation for Text Simplification

Most recent sentence simplification systems use basic machine translation models to learn lexical and syntactic paraphrases from a manually simplified parallel corpus. These methods are limited by the quality and quantity of manually simplified corpora, which are expensive to build. In this paper, we conduct an in-depth adaptation of statistical machine translation to perform text simplification...

Publication date: 1996